Experimentally designed chemometric models for the assay of toxic adulterants in turmeric powder

Turmeric is an indispensable culinary spice in different cultures and a principal component in traditional remedies. Toxic metanil yellow (MY), acid orange 7 (AO) and lead chromate (LCM) are deliberately added to adulterate turmeric powder. This work compares the ability of multivariate chemometric models with that of artificially intelligent networks to enhance the selectivity of spectral data for the rapid assay of these three adulterants in turmeric powder. Using a custom experimental design, we provide a data-driven optimization of the sensitive parameters of the partial least squares (PLS) model, the artificial neural network (ANN) and the genetic algorithm (GA). The optimized models are validated using sets of genuine turmeric samples from five different geographical regions spiked with standard adulterant concentrations. The optimized GA-PLS model reduces the root mean square error of prediction by 18.4%, 31.1% and 55.3%, and the optimized GA-ANN model by 25.0%, 69.9% and 88.4%, for MY, AO and LCM, respectively.


Introduction
Food provides the human body with the energy necessary to function and exist, provided that safety and quality are guaranteed. The latter is sometimes perturbed deliberately through the inclusion of inferior admixtures or the exclusion of valuable ingredients to increase profit margins at the expense of customer health and with social consequences. Laws, traditions and religions condemn any form of the above practices.
Turmeric rhizome, known as Indian saffron, is one of the largest selling natural food products in the world, and receives a great deal of attention from both medical and culinary specialists. 1 Turmeric comprises more than 100 compounds. Curcumin is the main active compound and is credited with most of turmeric's health benefits due to its antioxidant and anti-inflammatory properties. 2 Turmeric can be used to flavor or color many food substances, such as curry powder, mustard, butter and cheese. Medically, it treats several inflammatory conditions, 3 metabolic syndromes, 4 inflammatory degenerative eye conditions, 5 kidney and heart diseases, 3 cancer, 6 rheumatoid arthritis, 7 and several psychiatric disorders. 8 The relatively high global consumption and demand for turmeric in different applications (such as nutraceuticals, food flavorings and cosmetics) make it more vulnerable to adulteration with low-quality ingredients such as starch, chalk, yellow soapstone, lead chromate and synthetic dyes. 9 These adulterants can result in cardiovascular, neurological, hepatotoxic and nephrotoxic health hazards for consumers. For instance, lead chromate (LCM) and synthetic dyes such as metanil yellow (MY) and acid orange 7 (AO) are deliberately added to mimic the color of turmeric despite their hazardous health effects. 9 The electron-withdrawing character of the azo group in the synthetic dyes sometimes develops an electron deficiency, and the group is reduced to carcinogenic amino compounds. 10 Long-term consumption of metanil yellow (sodium 3-[4-anilinophenylazo]benzene sulfonate) causes severe damage to the heart and nervous tissues, 11,12 degenerative changes in the lining of the stomach, kidneys and liver, 13 and adverse effects on the ovaries and testes. 14,15 Acid orange 7 (sodium 4-[(2E)-2-(2-oxo-naphthalene-1-ylidene)hydrazinyl]benzene sulfonate) and lead chromate are deliberately added to adulterate turmeric powder. 16,17 Acid orange 7 irritates the eyes, skin, mucous membranes and upper respiratory tract, in addition to causing severe headaches, nausea, water-borne diseases such as dermatitis, and loss of bone marrow, leading to anemia. 16 Lead chromate can lead to lung and renal cancer, neuropathy, osteopathy, and respiratory tract toxicity. 18
Analytical chemistry guarantees food safety and quality through the development of reliable, sensitive and selective analytical methods to detect and quantify food adulterants and food ingredients. [19][20][21][22] Analytical reports provide authorities, food suppliers and consumers with the evidence necessary to build confidence in food authenticity, safety and quality. 23 Analyzing pharmaceutical and food products is a highly multivariate process, as many factors (parameters) may be involved during the analysis. Powerful data collection and statistical tools are required to identify the factors that may affect the results of the experiment. The design of experiment (DoE) technique is a robust statistical tool that has been successfully deployed in different types of systems, product design, process development and optimization. It plays a prevalent role in decision-making processes by providing useful experimental information and by planning the type and number of experiments needed to reach the optimal parameter settings. Consequently, it overcomes the difficulty of generating a large amount of useful experimental data that can be properly interpreted.
The DoE technique has been used extensively in several fields, such as chemistry, 24-29 agriculture, 30,31 engineering, 32,33 industry 34,35 (in particular the food industry [36][37][38]), and even in non-scientific areas, such as predicting sports results, 39 especially soccer and basketball. DoE can be used in other disciplines, such as energy [40][41][42][43][44] and sensor optimization. [45][46][47][48][49] Previously, many empirical studies in materials science have been made through one-factor-at-a-time (OFAT) experimentation, which provides uncorrelated material-based systems. This may be attributed to the complexity and safety-critical nature of the energy systems, the high number of components, and the processing conditions. However, DoE evaluates the contribution of different factors simultaneously and defines the redundancies needed for meaningful statistical assessment of the outcomes, allowing for the consistent establishment of energy behavioral-based strategies, a good prediction of energy yields, and integral optimization of electrochemical sensors for higher sensitivity and selectivity.
A literature survey revealed several methods to detect and/or quantify the adulteration of turmeric, such as HPLC, 50,51 HPTLC, 52,53 voltammetry, 54 multispectral imaging, 55 spectrophotometry 56 and FTIR. [57][58][59][60] However, the reported literature failed to reveal a single analytical method for the simultaneous assay of MY, AO and LCM in turmeric powder.
In the current work, we investigated and compared the ability of (1) traditional chemometric models, (2) artificially intelligent neural networks, (3) the genetic algorithm variable selection tool, and finally (4) experimentally designed optimization to discriminate and simultaneously quantify toxic adulterants in the complex natural matrix of turmeric powder. Once fed with a single UV spectrum of the turmeric powder extract, the models decode the UV absorbances to discriminate and quantify the adulterants within the complex turmeric powder. The ICH validation parameters were computed to ensure the validity of the method. The model represents a simple, direct and fast analytical tool for quality control laboratories to investigate turmeric samples, and to support food suppliers and authorities with scientific evidence regarding food safety and quality.

Samples and reagents
Standard metanil yellow (purity 98.00%) and acid orange 7 (purity 98.00%) were purchased from Techno Pharmchem (India). Lead chromate standard (purity 98.00%) was purchased from OTTO (India). Genuine turmeric rhizomes (Curcuma longa L.) were purchased from local markets in Egypt: Ragab (RG), Abu-Auf (ABU), Harraz (HZ) and Medrar (MD). In addition, rhizomes were obtained from Danube (DA) in a Saudi Arabian market. They were kindly identified by the Pharmacognosy Department, Faculty of Pharmacy, October 6 University, Cairo, Egypt. The rhizomes were ground separately to obtain the pure turmeric powder.
Methanol, ethanol, acetonitrile, acetone, acetic acid, nitric acid and sodium hydroxide (NaOH) pellets were purchased from El-NASR Pharmaceutical Chemicals Co., (Egypt). Sodium hydroxide solution (0.2 M) was prepared in distilled water and used as a solvent.

Instrumentation
The spectrophotometric measurements were carried out using a Shimadzu dual-beam UV-visible spectrophotometer, model UV-1800, with a 1 cm quartz cell, supplied with UV-Probe 2.32 software (Shimadzu Scientific Instruments Inc., Kyoto, Japan). All chemometric methods were implemented in MATLAB® 8.1.0.604 (R2013a) and the PLS version 2.1 toolbox. Design-Expert® 13.0.1.0 software was used to analyze the results.

Standard solutions
Stock standard solutions (500 µg mL⁻¹) of MY, AO and LCM were prepared using NaOH (0.2 M) as a solvent.
2.4.2 Preparation of the calibration and validation sets. Sets of 25 and 5 mixtures were used for the construction of the calibration set and the validation set, respectively. Each mixture contains different concentrations of the three adulterants spiked into pure turmeric powder. Accurate 20 mg weights of pure turmeric powder (from five different sources: RG, ABU, HZ, MD and DA) were transferred separately into falcon tubes (50 mL), and spiked with standard MY (200, 400, 600, 800 and 1000 µg), standard AO (800, 1000, 1200, 1400 and 1600 µg) and standard LCM (2000, 2500, 3000, 3500 and 4000 µg) according to the multilevel multifactor design. The spiked turmeric powder was sonicated with NaOH (0.2 M, 50 mL) for 15 min and filtered, and the clear filtrate was quantitatively transferred into a volumetric flask (100 mL) and completed to volume using the same solvent.
2.4.3 Wavelength range selection. Different wavelength ranges were examined to select the range that gives the highest sensitivity and selectivity for the proposed models while avoiding noisy and poorly informative regions.
2.4.4 Construction of the PLS models. Zero-order absorption spectra of the three adulterants (MY, AO, LCM) were recorded in the wavelength range of 200-800 nm using NaOH (0.2 M) as solvent. The wavelengths in the range of 230-570 nm were selected during the analysis, as the three analytes exhibit adequate absorbance within the working concentration ranges. Cross-validation was carried out using the leave-one-out method, and the root mean square error of cross-validation (RMSECV) was computed and used to obtain the optimum number of latent variables.
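The leave-one-out selection of latent variables can be illustrated with a minimal sketch. It is not the MATLAB/PLS toolbox implementation used in this work: it assumes a single-response PLS1 model fitted with the NIPALS algorithm in plain Python, and the toy absorbance matrix `X`, the response `y` and all function names are hypothetical and chosen only for illustration.

```python
import math

def pls1_fit(X, y, n_lv):
    """Fit a single-response PLS1 model by NIPALS (illustrative sketch)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    comps = []
    for _ in range(n_lv):
        # Weight vector: covariance between each wavelength and the response.
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:          # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(yc[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next latent variable.
        Xc = [[Xc[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps

def pls1_predict(model, x):
    """Predict the response for one spectrum x."""
    x_mean, y_mean, comps = model
    xr = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, load, q in comps:
        t = sum(xr[j] * w[j] for j in range(len(xr)))
        y_hat += q * t
        xr = [xr[j] - t * load[j] for j in range(len(xr))]
    return y_hat

def rmsecv_loo(X, y, n_lv):
    """Root mean square error of leave-one-out cross-validation."""
    sq_err = 0.0
    for k in range(len(X)):
        model = pls1_fit(X[:k] + X[k + 1:], y[:k] + y[k + 1:], n_lv)
        sq_err += (pls1_predict(model, X[k]) - y[k]) ** 2
    return math.sqrt(sq_err / len(X))

# Hypothetical, noise-free toy data: 6 mixtures, 3 wavelengths.
X = [[0.1, 0.5, 0.3], [0.4, 0.2, 0.9], [0.7, 0.8, 0.1],
     [0.3, 0.9, 0.6], [0.9, 0.1, 0.4], [0.5, 0.6, 0.8]]
y = [10 * a + 4 * b + 2 * c for a, b, c in X]

rmsecv_by_lv = {n_lv: rmsecv_loo(X, y, n_lv) for n_lv in (1, 2, 3)}
best_lv = min(rmsecv_by_lv, key=rmsecv_by_lv.get)
```

With real, noisy spectra the RMSECV curve typically flattens or rises after some number of latent variables, and that point is taken as the optimum rather than the absolute minimum of a noise-free example.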
2.4.5 Variable selection using the genetic algorithm tool. The genetic algorithm (GA) parameters were configured (Table S1†). The GA procedure was repeated several times to select the relevant wavelengths out of the 341 wavelengths in the range 230-570 nm. The selected wavelengths were used to build the GA-PLS model. The model was used to determine the concentration of MY, AO and LCM.
2.4.6 Design of experiment for the optimization of the GA parameters. A two-level (−1, +1) factorial design with center points was followed to optimize the genetic algorithm parameters. Three numeric factors were manipulated, two of which were GA parameters. The optimized parameters were the maximum number of latent variables (ml) required to build the GA model, the included fitness percentage (fit%), and the number of latent variables used to build the PLS model (LV) (Table S2†). The design included three levels for ml (3, 8 and 13), fit% (50%, 70% and 90%) and LV (4, 5 and 6). The optimized parameters of the GA were used to construct the PLS model, and the results were analyzed using Design-Expert® software. RMSEP was calculated to assess the predictive ability of the model.
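As an illustration of how such a design can be enumerated, the sketch below builds the eight corner runs of a two-level, three-factor factorial design and appends center-point runs. The factor levels are those listed above; the number of center points (`n_center`) is an assumption, since it is not stated here, and the function and key names are hypothetical.

```python
from itertools import product

# (low, center, high) levels for each factor, as listed in the text.
LEVELS = {
    "ml": (3, 8, 13),        # maximum number of latent variables in the GA
    "fit_pct": (50, 70, 90), # included fitness percentage
    "LV": (4, 5, 6),         # latent variables of the PLS model
}

def factorial_with_centers(levels, n_center=3):
    """Two-level full factorial runs plus replicated center points.

    n_center is an assumed value, not one taken from the paper.
    """
    names = list(levels)
    runs = []
    for coded in product((-1, 1), repeat=len(names)):
        # Coded level -1 maps to index 0 (low), +1 maps to index 2 (high).
        runs.append({n: levels[n][1 + c] for n, c in zip(names, coded)})
    center = {n: levels[n][1] for n in names}
    runs.extend(dict(center) for _ in range(n_center))
    return runs

design = factorial_with_centers(LEVELS)
```

Each dictionary in `design` is one run; the replicated center runs provide the pure-error estimate against which curvature and lack of fit are judged.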
2.4.7 Articial neural network. Articial neural network (ANN) is a computing system that mimics how the human brain analyzes and processes data. The optimized GA data were used to construct the articial neural network GA(DoE)-ANN. The absorbance matrix was reduced from 341 wavelengths to 123 wavelengths for the three adulterants before presenting them into the network to save the modeling time. The absorbances of the selected wavelengths (123 wavelengths) were used as inputs, while the concentration matrix of the three adulterants was used as output for the GA(DoE)-ANN model. The ANN parameters were adjusted using the Plackett-Burman design (Table  S3 †). 2.4.8 Analysis of the calibration and validation sets. The PLS, GA-PLS, GA(DoE)-PLS and GA(DoE)-ANN models were constructed to determine the concentration of each analyte in the calibration set, the recovery percent, standard deviation, relative standard deviation (% RSD), and root mean square error of calibration (RMSEC). Aerward, the developed models determined the concentration of each adulterant in the validation set mixtures. The recovery percent, standard deviation, % RSD and RMSEP were calculated.
2.4.9 Reproducibility of the models. The reproducibility of the models was tested using different concentrations of the adulterants. Three concentrations of MY (2, 4 and 10 µg mL⁻¹), AO (10, 12 and 16 µg mL⁻¹) and LCM (20, 25 and 40 µg mL⁻¹) were analyzed three times intra-daily and on three successive days. The % RSD values of the three adulterants were calculated.
2.4.10 Application. The literature failed to reveal a quantitative method to assay the three adulterants simultaneously in pure turmeric powder. According to ICH recommendations, 61 the validation was performed through the determination of the adulterant standards spiked into pure turmeric powder samples from different sources. Accurate 20 mg weights of turmeric powder were spiked with standard MY (200 and 400 µg), AO (800 and 1000 µg) and LCM (2000 and 2500 µg). The mixtures were then sonicated with 50 mL NaOH (0.2 M) in a 50 mL falcon tube, and filtered. The filtrate was then quantitatively transferred into a volumetric flask (100 mL), and completed to volume using NaOH (0.2 M).

Results and discussion
Detection of food adulteration has become an increasing concern for governments and institutions seeking to guarantee food safety and quality. Therefore, a single, fast and cheap analytical method was developed for the detection of possible turmeric adulterants. The method should be able to assay the hazardous adulterants in different sources of turmeric powder with minimal sample preparation: only extract and measure. Thus, simple and cheap UV-visible spectrophotometry was a good choice. The spectral data were analyzed using multivariate chemometric models. Chemometrics is combined with spectrophotometric techniques to recognize and assay compounds from their combined spectral data.

Spectral characteristics
The different turmeric extracts absorb light with a similar pattern but to different extents, whereas the absorption spectra of the three adulterants (MY, AO, LCM) differ in pattern and extent from each other and from turmeric powder (Fig. 1). Chemometrics was therefore applied to resolve the three adulterants in turmeric powder from their spectral data.

Multivariate analysis of different turmeric rhizomes
Five different sources of turmeric rhizomes were purchased from different markets and ground to obtain the pure turmeric powder. Then, the spiked turmeric powders were successfully analyzed using four different multivariate chemometric models, namely partial least squares (PLS), genetic algorithm-partial least squares (GA-PLS), optimized genetic algorithm-partial least squares (GA(DoE)-PLS), and an artificial neural network using the optimized genetic algorithm dataset (GA(DoE)-ANN), in which each source was given a specific level coded as −2, −1, 0, 1 or 2.

Solvent selection
Many trials were carried out to find a suitable solvent that freely dissolves the three adulterants. Different solvents were tried, such as methanol, ethanol, acetonitrile, acetone, acetic acid, diluted nitric acid and aqueous NaOH. The three adulterants were freely soluble in aqueous NaOH (0.2 M).

Wavelength selection
The wavelengths used were in the range of 230-570 nm, which achieved good linearity for the three adulterants. Meanwhile, the other wavelengths were discarded due to the noise appearing within the range of 200-229 nm and the poor absorbance within the range of 571-800 nm.

Construction of the models
A multilevel multifactor design was used to prepare a calibration set of 25 laboratory-prepared mixtures and a validation set of 5 mixtures containing different concentration levels of the three adulterants, ranging from 2-10 µg mL⁻¹ for MY, 8-16 µg mL⁻¹ for AO and 20-40 µg mL⁻¹ for LCM. Each model was constructed using the optimum number of latent variables to avoid unnecessary noise and the loss of meaningful data required to build the models.
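The concentration ranges quoted here follow directly from the spiked amounts and the 100 mL final volume described in the sample preparation. The short sketch below reproduces that arithmetic; the concentrations come out in the same mass unit as the spike, per mL, and the variable and function names are illustrative only.

```python
FINAL_VOLUME_ML = 100.0  # volumetric flask volume from the sample preparation

# Spiked amounts per mixture level, as listed in the preparation step.
SPIKE_LEVELS = {
    "MY": [200, 400, 600, 800, 1000],
    "AO": [800, 1000, 1200, 1400, 1600],
    "LCM": [2000, 2500, 3000, 3500, 4000],
}

def final_concentrations(spiked_amounts, volume_ml=FINAL_VOLUME_ML):
    """Concentration after dilution to volume (spike mass unit per mL)."""
    return [amount / volume_ml for amount in spiked_amounts]

CONCENTRATIONS = {
    name: final_concentrations(levels) for name, levels in SPIKE_LEVELS.items()
}
```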

Implementation of design of experiment technique
The GA reduced the RMSEP and the standard deviation of the results obtained using the PLS model. Quality-by-design principles were applied to maximize the accuracy and precision, using an experimental design to optimize the parameters of the genetic algorithm-PLS model (ml, fit% and LV). The latter reduced the error and increased the precision of the PLS model for the prediction of MY, AO, and LCM in the five commercially available sources of turmeric. Finally, the resolution of the classical PLS chemometric model was compared to that of an artificially intelligent neural network. The latter exhibited better predictability for the three adulterants in the five commercial sources of turmeric powder.

Predictive powers of PLS and GA-PLS models
A PLS model was constructed using six latent variables (Fig. S1†). The model successfully determined MY, AO and LCM in spiked turmeric samples within the calibration and validation sets. The mean recovery, % RSD, RMSE and other statistical parameters were calculated for each adulterant (Table 1).
The small RMSEP values indicate good predictability and high resolution power of the model. A good correlation coefficient (r) for each adulterant was achieved, which indicates a good fit between the predicted and the actual concentrations.
The genetic algorithm tool improved the predictive power of the PLS model by selecting the most informative wavelengths and excluding the less informative ones. It reduced the number of wavelengths by about 68.9% (from 341 to 106 wavelengths for the three adulterants). We constructed the GA-PLS model using the GA-selected wavelengths of the calibration set. RMSECV calculations show that six latent variables are adequate for the construction of the GA-PLS model (Fig. S1†). Unfortunately, GA did not reduce the number of latent variables compared to the previously mentioned PLS model. The RMSEP values were smaller than those of the PLS model (Fig. 2), which indicates an increase in the predictive power of the GA-PLS method compared to the classical PLS method.

Optimization of GA-PLS model
Even though the GA-PLS model was preferred over the classical one, its predictive ability still needed to be optimized using the design of experiment (DoE) technique. DoE evaluates the impact of different factors on the desired response, and identifies important interactions that are missed when experimenting with one factor at a time. The results were analyzed using one-way ANOVA (Table S4†). The small p-values (less than 0.05) and large F-value prove the fitness of the model, and its ability to determine the concentration of the three adulterants. These results indicate that there is only a 0.01% chance that an F-value this large could occur due to noise. However, the lack of fit (p-values more than 0.05) implies that the curvature of the model is not significant relative to the pure error, which could occur due to noise. A non-significant lack of fit is desirable as we want the model to fit, which means that the curvature is not significant. The relatively small difference between the adjusted and predicted R² indicates good predictability of the model. Adequate precision measures the signal-to-noise ratio. A ratio greater than 4 is achieved, which indicates an adequate signal and proves that the model can be used to navigate the design space. Residuals versus predicted data plots (Fig. S2†) indicated the good fitness of the model, as the residuals (errors) are within the specified limits. No significant interaction between the factors was observed. The desirability function selected the best level of each factor, which was then used to optimize the predictive ability of the GA-PLS model. The function suggested using ml, fit% and LV of 10, 82% and 5, respectively (Fig. S3†), with 97.60% desirability.
Fig. 1 Zero-order absorption spectra of (A) 200 µg mL⁻¹ turmeric extracts from different sources, (B) 10 µg mL⁻¹ metanil yellow (-), 10 µg mL⁻¹ acid orange 7 (---) and 10 µg mL⁻¹ lead chromate (..).
The suggested ml was applied to the GA parameters, while the suggested fit% was used to select the optimum set of wavelengths. The PLS model was constructed using the calibration set and the optimum number of LV (Fig. S1†). The smaller RMSEP of the GA(DoE)-PLS model relative to that of the GA-PLS model suggests better predictive ability of the former model (Fig. 2).
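How a desirability score combines several responses can be sketched as follows. This is not a reproduction of the Design-Expert optimization: it assumes a Derringer-Suich-type desirability for responses that should be minimized (such as the RMSEP of each adulterant) and an overall desirability taken as the geometric mean of the individual values. The acceptance bounds, the RMSEP values and the function names below are hypothetical.

```python
import math

def desirability_minimize(y, low, high, weight=1.0):
    """Desirability of a smaller-is-better response (assumed form).

    Returns 1 at or below `low`, 0 at or above `high`, and a
    weight-shaped ramp in between.
    """
    if y <= low:
        return 1.0
    if y >= high:
        return 0.0
    return ((high - y) / (high - low)) ** weight

def overall_desirability(values):
    """Geometric mean of the individual desirabilities."""
    if any(v == 0.0 for v in values):
        return 0.0
    return math.prod(values) ** (1.0 / len(values))

# Hypothetical RMSEP values for MY, AO and LCM with hypothetical bounds.
rmsep = {"MY": 0.10, "AO": 0.20, "LCM": 0.30}
bounds = {"MY": (0.05, 0.50), "AO": (0.10, 0.80), "LCM": (0.20, 1.00)}
d_values = [desirability_minimize(rmsep[k], *bounds[k]) for k in rmsep]
D = overall_desirability(d_values)
```

Because the overall score is a geometric mean, a factor setting that drives any single response to an unacceptable value receives an overall desirability of zero, regardless of how well the other responses perform.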

Optimization of ANN model
The ANN model was constructed using the GA(DoE) dataset. Different numbers of hidden neurons were tested to select the number that gave the best ANN predictive ability. The optimum number of hidden neurons was seven, which gave a small MSE and a correlation coefficient r near unity (Fig. 3). The ANN architecture shows the different layers used for the prediction of the concentration of the three adulterants (Fig. S4†). The purelin-purelin transfer function pair was suitable owing to the linear relationship between absorbance and concentration. Levenberg-Marquardt backpropagation (TRAINLM) was preferred as the training function. The GA(DoE)-ANN model was applied to the validation set, and RMSEP was calculated.
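A network whose hidden and output layers both use a linear (purelin) transfer function computes an affine map of its inputs, which is consistent with a linear absorbance-concentration relationship. The sketch below illustrates this with the layer sizes given in the text (123 inputs, 7 hidden neurons, 3 outputs); the weights are random placeholders rather than trained values, and all names are hypothetical.

```python
import random

N_IN, N_HIDDEN, N_OUT = 123, 7, 3   # layer sizes taken from the text

def affine(W, b, x):
    """Compute W x + b for a weight matrix W and bias vector b."""
    return [sum(w * v for w, v in zip(row, x)) + bias
            for row, bias in zip(W, b)]

def forward(x, W1, b1, W2, b2):
    """Forward pass with purelin (identity) transfer in both layers."""
    hidden = affine(W1, b1, x)      # purelin: f(z) = z
    return affine(W2, b2, hidden)   # purelin: f(z) = z

def collapse(W1, b1, W2, b2):
    """Single affine map equivalent to the two linear layers."""
    W = [[sum(W2[i][k] * W1[k][j] for k in range(len(W1)))
          for j in range(len(W1[0]))] for i in range(len(W2))]
    b = [sum(W2[i][k] * b1[k] for k in range(len(b1))) + b2[i]
         for i in range(len(W2))]
    return W, b

rng = random.Random(0)              # placeholder weights, not trained values
W1 = [[rng.uniform(-1, 1) for _ in range(N_IN)] for _ in range(N_HIDDEN)]
b1 = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
W2 = [[rng.uniform(-1, 1) for _ in range(N_HIDDEN)] for _ in range(N_OUT)]
b2 = [rng.uniform(-1, 1) for _ in range(N_OUT)]
spectrum = [rng.uniform(0, 1) for _ in range(N_IN)]

network_output = forward(spectrum, W1, b1, W2, b2)
W_eq, b_eq = collapse(W1, b1, W2, b2)
collapsed_output = affine(W_eq, b_eq, spectrum)
```

The two outputs agree to within floating-point rounding, so the hidden layer here acts as a seven-dimensional linear bottleneck between the selected wavelengths and the three concentrations.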

Model reproducibility and data prediction
The four models successfully predicted the concentration of the three analytes in the calibration and validation sets (Table 1). Reproducible and precise results (% RSD) were obtained for repeatability and intermediate precision, indicating the success of the four models in analyzing the three adulterants (Table 1). The GA(DoE)-PLS and GA(DoE)-ANN models achieved the best accuracy (low RMSEP) and precision (low % RSD) compared to the other models (Fig. S5 and S6†), owing to the useful GA variable selection tool, the effective DoE optimization approach, and the powerful ANN artificial intelligence (Fig. 2).

Validation of the models
Few methods were reported for the assay of MY in turmeric powder. However, no method was reported for the simultaneous assay of the three adulterants in turmeric powder. Accordingly, we validated the developed chemometric models as per the ICH guidelines. 61 This was achieved by spiking the pure turmeric powder with known concentrations of the three adulterants, and then assessing the ability of the proposed models to recover the concentration of the spiked adulterants. The recovery percentages and the relative standard deviation percentages were calculated (Table 2).
The work demonstrated the augmented influence of the GA and the experimental design on the predictive ability of the PLS and ANN models. The four chemometric methods were validated as per the ICH guidelines. The developed methods proved to be fast and cheap, and involved minimal sample preparation. The methods can be used for fast surveillance of turmeric powder adulteration in quality control laboratories before including the powder in food recipes, to guarantee authenticity, safety and quality.